A Framework for Effectively Integrating Hard and Soft Syntactic Rules into Phrase Based Translation
Abstract
Using hard or soft syntactic rules to reorder the source sentence so that it closely approximates the target-language word order has been a successful way of adding syntactic knowledge to phrase-based translation and improving translation quality. However, this pre-reordering approach propagates its errors to the later translation step (decoding). In this paper, we propose a novel framework that integrates hard and soft syntactic rules into phrase-based translation more effectively. For a source sentence to be translated, hard or soft syntactic rules are first acquired from the source parse tree prior to translation. Then, instead of reordering the source sentence directly, the rules are used as a strong feature integrated into our elaborately designed model to help phrase reordering in the decoding stage. Experiments on NIST Chinese-to-English translation show that our approach, whether incorporating hard or soft rules, significantly outperforms the previous methods.
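The contrast between reordering the source up front and using the rules as a decoding feature can be pictured with a small sketch. This is only an illustration under stated assumptions, not the model described in the paper: the function names, the pairwise-agreement feature, the feature values, and the weights are all hypothetical. It shows a rule-suggested source order acting as one weighted feature in a log-linear score, so the decoder can still prefer a different order when other features outweigh it.

```python
from itertools import combinations

def rule_agreement(hyp_order, rule_order):
    """Hypothetical feature: fraction of source-position pairs whose relative
    order in the hypothesis matches the order suggested by the syntactic rules."""
    pos_h = {src: i for i, src in enumerate(hyp_order)}
    pos_r = {src: i for i, src in enumerate(rule_order)}
    pairs = list(combinations(sorted(pos_h), 2))
    if not pairs:
        return 1.0
    agree = sum((pos_h[a] < pos_h[b]) == (pos_r[a] < pos_r[b]) for a, b in pairs)
    return agree / len(pairs)

def loglinear_score(features, weights):
    """Weighted sum of feature values, as in a standard log-linear model."""
    return sum(weights[name] * value for name, value in features.items())

# Assumed toy input: the rules suggest moving source position 3 ahead of 1 and 2.
rule_order = [0, 3, 1, 2]

# Two candidate translation orders with made-up translation/language model scores.
hypotheses = {
    "follows_rules": {"order": [0, 3, 1, 2], "tm": -2.0, "lm": -3.5},
    "monotone":      {"order": [0, 1, 2, 3], "tm": -2.0, "lm": -3.0},
}

# Assumed weights; the rule feature is weighted strongly but is not a hard constraint.
weights = {"tm": 1.0, "lm": 1.0, "rule": 2.0}

scores = {}
for name, hyp in hypotheses.items():
    features = {
        "tm": hyp["tm"],
        "lm": hyp["lm"],
        "rule": rule_agreement(hyp["order"], rule_order),
    }
    scores[name] = loglinear_score(features, weights)

best = max(scores, key=scores.get)
print(best, scores)
```

With these assumed numbers the rule-following hypothesis wins, although the monotone one has the better language model score. Setting the `rule` weight to zero reverses the choice, which is the sense in which the rules guide the search instead of fixing the order before decoding.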
Similar resources
A unified approach for effectively integrating source-side syntactic reordering rules into phrase-based translation
Phrase-based translation models, with sequences of words (phrases) as translation units, achieve state-of-the-art translation performance. However, phrase reordering is a major challenge for these models. Recently, researchers have focused on utilizing syntax to improve phrase reordering. In adding syntactic knowledge to the phrase reordering model, using handcrafted or probabilistic syntactic rule...
A Unified Model for Soft Linguistic Reordering Constraints in Statistical Machine Translation
This paper explores a simple and effective unified framework for incorporating soft linguistic reordering constraints into a hierarchical phrase-based translation system: 1) a syntactic reordering model that explores reorderings for context free grammar rules; and 2) a semantic reordering model that focuses on the reordering of predicate-argument structures. We develop novel features based on b...
Phrase-Boundary Translation Model Using Shallow Syntactic Labels
The phrase-boundary model for statistical machine translation labels the rules with classes of boundary words on the target-side phrases of the training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...
Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models
Dissertation by Yuval Marton (Doctor of Philosophy, 2009), directed by Professor Philip Resnik, Department of Linguistics and Institute for Advanced Computer Studies. This dissertation focuses on effective combination of data-driven natural language processing (NLP) approaches with lingu...
Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach k...